In this paper we analyze two social network datasets of contemporary musicians constructed from
allmusic.com (AMG), a music and artists' information database: one is the collaboration network, in
which two musicians are connected if they have performed in or produced an album together, and the
other is the similarity network, in which they are connected if they were judged musically similar
by music experts. We find that, while both networks exhibit typical features of social networks such
as high transitivity, several key network features, such as the degree and betweenness distributions,
suggest fundamental differences in how music collaboration and music similarity networks are created.


I. INTRODUCTION 

Developments in computer and information technologies have allowed users to search for the information they need
on the Internet, rather than in the traditional arena of libraries or printed media. In particular, developments in
e-commerce technology have produced large commercial retailers offering thousands of products and serving millions of
customers each day. E-commerce technology has reduced the cost of inventory storage and distribution, leading to what
is known as the long-tail phenomenon. This is related to the distribution of sales of a general item (books, CDs, DVDs,
etc.), which generally decays as a power law: a few items are sold in high volumes while most items
suffer low sales volume. Therefore, by reducing the storage and distribution costs, it can be profitable for companies
to concentrate on selling those less popular items, whose combined revenue can then exceed that of a best-seller.
Several websites, such as Amazon (http://www.amazon.com), allow on-line users to access any product by navigating
through a network of links between items. Besides the commercial impact of allowing low-sales items to gain visibility,
networks of this kind are sources of information about product similarity, category structures, and so forth.

In this paper, we analyze the topology of two social networks of contemporary popular musicians taken from
the AllMusic database of music metadata (http://www.allmusic.com). The content of the database is created by
professional data entry staff, editors and writers. We work with two datasets: the collaboration network and
the similarity network of artists in the database. The networks were constructed as follows: two artists were connected
in the collaboration network if they had worked on one or more albums together, while they were connected in
the similarity network if the music experts of AllMusic judged them similar according to some criteria.

There are several reasons that make these networks interesting. First, studying the collaboration network, formed
naturally by the actual professional acts of artists, may teach us how musical tendencies spread via the formation of
professional relationships between musicians, which could prove worthwhile for musicology. Second, studying the similarity
network, which is a large-scale result of human experts' perception of music, may help in designing recommendation
systems in which machines are trained to perform the same task, and eventually help users discover music more easily.

As we will see, both networks show typical characteristics of real-world networks, such as high transitivity and the
small-world property, as others have shown. However, the discrepancies between the two sets, notably in the
degrees and the betweenness centralities of the same vertices, suggest a fundamental difference between the two networks.



II. THE DATASETS 



On a typical artist's page in the allmusic.com database, we can find hyperlinks to other artists under various
categories: "Similar Artists", "Worked With", "Followers", etc. We can regard the existence of a link between two
artists as a tie in a social network. Using links in the category "Similar Artists", which had been created by
the music experts of allmusic.com, we constructed the similarity network. Here, Mick Jagger of the Rolling Stones is
connected to Tina Turner or David Bowie. On the other hand, using links in another category, "Worked With", we
constructed the collaboration network, where Mick Jagger is now connected to other members of the Rolling Stones,
such as Keith Richards or Charlie Watts, and others [14].

TABLE I: Summary of several network characteristics of the similarity, collaboration, and intersection subnetworks:
number of vertices $n$, number of edges $m$, number of vertices in the largest component $S_0$ and its percentage among
all vertices, mean geodesic path $d$ in $S_0$, diameter $d_{\max}$ of $S_0$, global clustering coefficient $C$, the
highest degree $k_{\max}$ and the corresponding artist(s), and the artist with the highest betweenness.

The similarity network is composed of 32,377 vertices (artists) and 117,621 edges, and the collaboration network
is composed of 34,724 vertices and 123,082 edges. These two networks have 8,509 vertices in common. Between
themselves, these common vertices have 24,950 edges in the similarity network and 20,232 edges in the collaboration
network. This structure is shown in Fig. 1. The two subnetworks defined on these common vertices will enable us to
conduct a direct comparison between similarity and collaboration link patterns.
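
As an illustration of how such intersection subnetworks can be obtained, the following minimal Python sketch restricts two edge lists to their common vertices. The function names and the toy vertex labels are hypothetical and are not taken from the allmusic.com data; the source does not describe the actual procedure used.

```python
def vertex_set(edges):
    """Return the set of vertices that appear in an edge list."""
    return {v for edge in edges for v in edge}


def induced_edges(edges, keep):
    """Keep only edges whose two end points both belong to `keep`."""
    return {frozenset((u, v)) for u, v in edges if u in keep and v in keep and u != v}


# Toy edge lists (illustrative labels, not real artists).
collaboration = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
similarity = [("b", "c"), ("b", "d"), ("d", "f"), ("f", "g")]

# Vertices present in both networks, and the edges each network has among them.
common = vertex_set(collaboration) & vertex_set(similarity)
collaboration_sub = induced_edges(collaboration, common)
similarity_sub = induced_edges(similarity, common)
```

Storing each edge as a `frozenset` makes the comparison independent of the order in which the two end points are listed.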



[Figure 1: two sets labeled "Collaboration Network" and "Similarity Network"]

FIG. 1: The structure of the data sets studied in this paper. The two sets of network data have an intersection consisting of
common vertices. These common vertices and the edges between them (similarity or collaboration) comprise the subnetworks.



III. BASIC NETWORK PROPERTIES 

In this section, we study several key properties of the networks, such as the degree distribution, transitivity,
nearest-neighbor degree correlation, component structure and the Freeman centrality of vertices. They are summarized in
Table I and Fig. 2.

A. Mean geodesic length, diameter and component structure 

A prominent feature of a complex network is the so-called "small-world effect", which means that the shortest
paths (also called geodesics) between vertices are very short compared to the system size. The longest geodesic in the
network is called its diameter. We see in Table I that the average geodesic length $d$ is smaller than 7, while the
diameter is no larger than 23 in each network.

FIG. 2: The cumulative degree distribution $P(k)$ (first row), the local clustering coefficient $C(k)$ (second row), the
nearest-neighbor degree distribution $k_{nn}$ (third row), and the cumulative betweenness centrality distribution $P(B)$
(fourth row) for the collaboration (circle) and similarity (diamond) networks.

A component of a network is a set of vertices that are connected to one another via one or more geodesics, and
disconnected (i.e., no geodesics) from all other vertices. Typically, networks possess one large component that contains
a majority of vertices. In Table I we see that, with the exception of the collaboration subnetwork, each giant component
contains roughly 90% or more of the vertices.
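
The quantities of this subsection can be computed with breadth-first search. The sketch below is a minimal illustration on a toy network; the function names are hypothetical, and the source does not state which algorithm was actually used.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Geodesic lengths from `source` to every vertex it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def components(adj):
    """Connected components, largest first."""
    seen, found = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            found.append(comp)
    return sorted(found, key=len, reverse=True)


def geodesic_stats(adj, component):
    """Mean geodesic length and diameter within one component."""
    total = pairs = diameter = 0
    for s in component:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return (total / pairs if pairs else 0.0), diameter


# Toy network: a path 1-2-3-4 plus a separate pair 5-6.
adj = build_adjacency([(1, 2), (2, 3), (3, 4), (5, 6)])
giant = components(adj)[0]
giant_fraction = len(giant) / len(adj)
mean_d, d_max = geodesic_stats(adj, giant)
```

Running one search per vertex costs on the order of $n(n+m)$ operations, which is feasible for networks of this size but slow in pure Python.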



B. Degree distribution 

The number of vertices linked to a vertex is called its degree, usually denoted $k$. The degree distribution $p_k$ is the
fraction of vertices in the system with degree $k$. Many real-world networks, including the Internet and the worldwide
web (WWW), are known to show a right-skewed distribution, often a power law $p_k \propto k^{-\tau}$ with $2 < \tau < 3$.
More frequently, the cumulative degree distribution $P(k) = \sum_{k' \ge k} p_{k'}$, the fraction of vertices having
degree $k$ or larger, is plotted. A cumulative plot avoids fluctuations at the tail of the distribution and facilitates
the evaluation of the power-law exponent $\tau$ in case the network follows a power law.
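
A minimal Python sketch of the cumulative degree distribution is given below. The function names and the toy edge list are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter


def degree_sequence(edges):
    """Degrees of all vertices that appear in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())


def cumulative_degree_distribution(degrees):
    """Map each observed degree k to P(k), the fraction of vertices with degree >= k."""
    n = len(degrees)
    counts = Counter(degrees)
    cumulative, running = {}, 0
    for k in sorted(counts, reverse=True):
        running += counts[k]
        cumulative[k] = running / n
    return cumulative


# Toy network: a star with center 0 and leaves 1..3, plus the edge 1-2.
degrees = degree_sequence([(0, 1), (0, 2), (0, 3), (1, 2)])
P = cumulative_degree_distribution(degrees)
```

Plotting `P` on log-log axes gives a straight line for a power law, while an exponential form appears straight on log-linear axes.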

We see in Fig. 2 that the collaboration network exhibits a power-law degree distribution near its tail,
$p(k) \sim k^{-\tau}$, following a straight line in a log-log representation. We obtain a similar result when looking at
its intersection subnetwork.

On the other hand, the similarity network closely follows an exponential form $p_k \sim \exp(-0.12k)$, while its
intersection subnetwork follows $p_k \sim \exp(-0.15k)$. As such, there is a huge difference in $k_{\max}$: in the
similarity data, R.E.M. and Eric Clapton are the most connected artists in the entire dataset and the subnetwork, with
degrees 131 and 55 respectively, while in the collaboration data Paulinho Da Costa tops both with degrees 508 and 143
(tied with the legendary recording engineer Rudy Van Gelder of the famed Blue Note label), several times larger than his
counterparts in the similarity network.



C. Transitivity 

Transitivity, or clustering, is an indication of how cliquish (tightly knit) a network is. It is quantified by the
abundance of triangles in a network, where a triangle is formed when three vertices are all linked to one another. It
can be quantified by the global clustering coefficient $C$, defined as

$$ C = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}}. \tag{1} $$

Here, a connected triple means a pair of vertices connected via another vertex. Since a triangle contains three triples,
$C$ is equal to the probability that two neighbors of a vertex are connected as well. Typical social networks have $C$
ranging from a few percent to tens of percent, and we see in Table I that the music networks show values of 17%-18%. The
reason that this indicates an abundance of triangles is that in a fully random graph model of comparable size ($n$ and
$m$), $C$ is almost negligible: for example, with $n = 34{,}724$ and $m = 123{,}082$, a random graph has $C = 0.02\%$.
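
The random-graph value quoted above can be checked with a short calculation. The sketch below assumes that "random graph" refers to the standard model in which every pair of vertices is linked with equal probability, so that $C$ equals the link probability $2m/[n(n-1)]$; the source does not name the model. It also counts triangles and connected triples directly on a hypothetical toy network.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def global_clustering(adj):
    """C = 3 x triangles / connected triples."""
    closed = triples = 0
    for nbrs in adj.values():
        # Every pair of neighbors of a vertex is one connected triple.
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1  # each triangle is counted once per corner, i.e. 3 times
    return closed / triples if triples else 0.0


def random_graph_clustering(n, m):
    """Link probability of a random graph with n vertices and m edges (assumed model)."""
    return 2 * m / (n * (n - 1))


# Toy network: a triangle a-b-c with a pendant vertex d attached to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
C_toy = global_clustering(adj)

# Size of the full collaboration network as quoted in the text.
C_random = random_graph_clustering(34724, 123082)
```

For the toy network there are five connected triples, three of which are closed, so `C_toy` is 0.6, while `C_random` is about $2 \times 10^{-4}$, i.e. 0.02%.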

A closely related yet distinct measure is the local clustering coefficient $C_i$ of each vertex $i$ (for the case
$k_i > 1$), defined as

$$ C_i = \frac{\text{number of connected pairs of neighbors of } i}{\text{number of pairs of neighbors of } i}
       = \frac{\text{number of connected pairs of neighbors of } i}{\frac{1}{2} k_i (k_i - 1)}, \tag{2} $$

which is the fraction of pairs of neighbors of a vertex that are connected.

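Often the local clustering is plotted as a function of degree $k$, defined as the average of $C_i$ over all vertices
with a given degree $k$:

$$ C(k) = \frac{1}{n p_k} \sum_{i} C_i \, \delta_{k_i, k}, \tag{3} $$

where $n p_k$ is the number of vertices of degree $k$ and $\delta_{k_i, k}$ selects the vertices with $k_i = k$.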

Some real-world networks are known to show a behavior of $C(k) \propto k^{-1}$, usually attributed to the hierarchical
nature of networks. In Fig. 2 we have plotted the local $C(k)$. We observe that $C(k)$ decreases as $k^{-1}$ for the
range $30 < k < 300$ for the collaboration network, but the decreasing pattern is not as clear for the other data sets.
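
The local clustering coefficient and its degree average can be sketched as follows. The function names and the toy network are hypothetical, and vertices with $k_i \le 1$ are skipped because $C_i$ is not defined for them.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj):
    """C_i for every vertex with degree larger than one."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k > 1:
            linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            result[v] = linked / (k * (k - 1) / 2)
    return result


def clustering_by_degree(adj):
    """C(k): the average of C_i over vertices of degree k."""
    groups = defaultdict(list)
    for v, c in local_clustering(adj).items():
        groups[len(adj[v])].append(c)
    return {k: sum(cs) / len(cs) for k, cs in groups.items()}


# Toy network: a triangle a-b-c with a pendant vertex d attached to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
C_k = clustering_by_degree(adj)
```

Here the two degree-2 vertices have $C_i = 1$, while the degree-3 vertex has only one of its three neighbor pairs connected.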



D. Degree correlations 



We have also calculated the average nearest-neighbor degree $k_{nn}$ as a function of $k$,

$$ k_{nn}(k) = \sum_{k'=0}^{\infty} k' \, p(k'|k), \tag{4} $$

where $p(k'|k)$ is the fraction of edges attached to a vertex of degree $k$ whose other ends are attached to a vertex
of degree $k'$. Thus $k_{nn}(k)$ is the mean degree of the vertex we find by following a link emanating from a vertex of
degree $k$.
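
A minimal sketch of this quantity is shown below. It averages, over all vertices of degree $k$, the mean degree of their neighbors; because all those vertices carry the same number of edge ends, this coincides with the edge-based definition above. The function names and the toy star network are illustrative assumptions.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_neighbor_degree(adj):
    """k_nn(k): mean neighbor degree, averaged over vertices of degree k."""
    groups = defaultdict(list)
    for v, nbrs in adj.items():
        if nbrs:
            mean_neighbor_degree = sum(len(adj[w]) for w in nbrs) / len(nbrs)
            groups[len(nbrs)].append(mean_neighbor_degree)
    return {k: sum(vals) / len(vals) for k, vals in groups.items()}


# Toy network: a star, the simplest strongly disassortative example.
adj = build_adjacency([("hub", 1), ("hub", 2), ("hub", 3)])
knn = average_neighbor_degree(adj)
```

In the star, the leaves (degree 1) see a neighbor of degree 3 and the hub (degree 3) sees neighbors of degree 1, so $k_{nn}(k)$ decreases with $k$.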

The $k_{nn}(k)$ for our four datasets are plotted in Fig. 2 (third row). Here we see another difference between the two
main networks [apart from that observed in $P(k)$]. While for the similarity network it is a nearly monotonic, increasing
function, for the collaboration network it does not have a simple form at all. The behavior of $k_{nn}(k)$ is related to
the assortativity of the network [9], which indicates the tendency of a vertex of degree $k$ to associate with a vertex
of the same $k$. When $k_{nn}(k)$ is an increasing function of $k$, which is the case for the similarity network (see
Fig. 2, third row, central plot), the network is assortative. In other words, the most connected artists are prone to be
similar to other highly connected artists. On the other hand, we can observe that $k_{nn}(k)$ of the collaboration
network is rather noisy (Fig. 2, third row, first plot). The first section of $k_{nn}(k)$, for values of $k$ up to 12,
is assortative while the tail is not. The assortativity for small values of $k$ could be related to band size, since all
members of a band obviously collaborate with all the others. The same reasoning does not apply for larger values. It
could also be argued that the assortativity observed in the similarity network is not a consequence of collaboration
between artists.

A closely related concept is the degree-degree correlation coefficient $r$, which is the Pearson correlation coefficient
for degrees of vertices at either end of a link:

$$ r = \frac{\sum_i k_i^2 k_{nn,i} - (2m)^{-1}\left[\sum_i k_i^2\right]^2}{\sum_i k_i^3 - (2m)^{-1}\left[\sum_i k_i^2\right]^2}
     = \frac{\sum_k k^2 k_{nn}(k)\, p_k - z^{-1}\left[\sum_k k^2 p_k\right]^2}{\sum_k k^3 p_k - z^{-1}\left[\sum_k k^2 p_k\right]^2}, \tag{5} $$

where $p_k$ is the degree distribution, $k_{nn,i}$ is the mean degree of the neighbors of vertex $i$, $z = 2m/n$ is the
mean degree, $\sum_i$ denotes the sum over vertices and $\sum_k$ denotes the sum over degrees. We can clearly
see the connection between $r$ and $k_{nn}(k)$. A monotonically increasing (decreasing) $k_{nn}(k)$ means,
as mentioned before, that high-degree vertices are connected to other high-degree (low-degree) vertices and vice versa,
and it results in a positive (negative) value of $r$, as in the case of the similarity network, which has $r = 0.184$
(for the intersection portion of the network, $r = 0.188$). In other cases, however, we cannot read it off easily: for
the entire collaboration network, $r = -0.00575$, while for its intersection portion $r = 0.0372$. The result for the
collaboration network is especially interesting, since most social networks are known to show a positive degree-degree
correlation (as seen in the similarity network), which is thought to originate in part from community structure [10].
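
As a sketch of how $r$ can be evaluated in practice, the code below computes the Pearson correlation of the degrees found at the two ends of every edge, counting each edge in both directions. It assumes a simple undirected network without repeated edges; the function name and the toy networks are hypothetical.

```python
from collections import Counter


def degree_correlation(edges):
    """Pearson correlation of the degrees at either end of an edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    two_m = 2 * len(edges)  # number of edge ends
    sum_jk = sum_k = sum_k2 = 0
    for u, v in edges:
        ku, kv = deg[u], deg[v]
        sum_jk += 2 * ku * kv          # the edge is seen from both of its ends
        sum_k += ku + kv
        sum_k2 += ku * ku + kv * kv
    mean = sum_k / two_m
    variance = sum_k2 / two_m - mean ** 2
    if variance == 0:
        raise ValueError("r is undefined when all edge ends have the same degree")
    return (sum_jk / two_m - mean ** 2) / variance


# A star is perfectly disassortative; a short path is mildly so.
r_star = degree_correlation([("hub", 1), ("hub", 2), ("hub", 3)])
r_path = degree_correlation([(1, 2), (2, 3), (3, 4)])
```

The star gives $r = -1$ and the four-vertex path gives $r = -0.5$, the sign being negative in both cases because high-degree vertices are attached to low-degree ones.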



E. The betweenness (Freeman) centrality

Given the inhomogeneity of link patterns around vertices in a complex network, we could certainly imagine that
the position and roles of vertices will vary significantly from one vertex to another. Centrality, as its name suggests,
is a concept that differentiates vertices according to how influential, or central, they are in a network. Degree is one
kind of centrality, since it would be reasonable to assume that people with particularly many acquaintances can be
regarded as important figures. However, degree is primarily local in scope (and talking loudly does not mean
you are affecting others more effectively than somebody who speaks quietly but very eloquently, so to speak), and
to overcome its shortcomings social scientists in particular have developed various measures of centrality. For our
network datasets we choose to study the betweenness or Freeman centrality [4].

The idea behind this centrality measure is that a central vertex will act as a relay of information between vertices,
a role it is endowed with thanks to being on a geodesic between vertices (hence the name betweenness). Considering a
vertex as a relay of information, so that it has a "power to withhold information ... or to refuse to pass on requests
for information", seems intuitively appropriate for communication network systems, and it has recently been studied on
the Internet as well.

The reason for choosing this centrality to study these networks was that we were interested in gaining a glimpse
of how musical influences (considered as information) might spread via the complex network of artists. In particular,
"crossover" musicians are becoming more common these days, and we anticipated that those people who produce
albums across genres were important in the musical development of multiple genres and, by becoming bridges between
genres, might have higher betweenness centrality.

The Freeman (betweenness) centrality $B_l$ of a vertex $l$ is defined as

$$ B_l = \sum_{i \neq j \neq l} \frac{g_{ilj}}{g_{ij}}, \tag{6} $$

where $g_{ij}$ is the total number of geodesics between vertices $i$ and $j$, and $g_{ilj}$ is the number of those that
pass through the vertex $l$.

In Fig. 2 (fourth row) we have plotted the cumulative fraction $P(B)$ of Freeman centralities for our datasets. We
see that this distribution is highly skewed in both cases (with no differences in the subnetworks). Similar results
were obtained by different authors in other kinds of networks. In Table I there is a list of the artists with the highest
betweenness centrality in each data set. It is interesting to note that in the case of the similarity network data, the
highest-degree vertex is not the highest-centrality vertex. We will discuss this point in more depth in the next section.

IV. COMPARISON OF SELF-ORGANIZED NETWORK AND ARTIFICIAL NETWORK

An interesting question, as we have posed in the beginning of this paper, is how differently an individual is
represented in different types of networks. People belong to many spheres of social activity, and their relationships
with the same people may well be different in each sphere. In fact, our two intersection data sets seem to be quite
different. Among the 24,950 and 20,232 edges belonging to the similarity and collaboration data respectively, there
are only 464 common edges, so having worked together does not necessarily (practically not at all) translate into
being classified as musically similar.

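
To put the number of common edges in perspective, the sketch below counts shared edges between two edge lists and expresses the overlap as a Jaccard index (shared edges divided by the union of edges). The Jaccard index is introduced here purely as an illustration and is not a quantity reported in the source; the function name and the toy edges are hypothetical, while the three counts are those quoted in the text.

```python
def edge_overlap(edges_a, edges_b):
    """Return (number of shared edges, Jaccard index) for two undirected edge lists."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    shared = a & b
    union = a | b
    return len(shared), (len(shared) / len(union) if union else 0.0)


# Toy check on hypothetical edges; ("b", "c") and ("c", "b") are the same edge.
n_shared, jaccard_toy = edge_overlap([("b", "c"), ("c", "d")], [("c", "b"), ("b", "d")])

# The same ratios from the counts quoted in the text.
common_edges, edges_1, edges_2 = 464, 24950, 20232
jaccard = common_edges / (edges_1 + edges_2 - common_edges)
share_1 = common_edges / edges_1
share_2 = common_edges / edges_2
```

With the quoted counts, the common edges make up roughly 2% of either subnetwork and about 1% of the union of the two edge sets.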
COLLABORATION

Rank  Artist                   Rank in similarity network     Comments
1     Paulinho Da Costa        2,933                          Percussionist
2     Jim Keltner              5,468                          Percussionist
3     Ron Carter               2,689                          Bassist
4     Rudy Van Gelder          5,468                          Recording engineer
5     Dean Parks               5,468                          Guitarist
6     Herbie Hancock           299                            Jazz pianist
7     Randy Brecker            4,073                          Trumpeter and flugelhornist
8     Jim Horn                 4,517                          Saxophonist
9     Dann Huff                3,620                          Guitarist
10    Tony Levin               1,471                          Bassist

SIMILARITY

Rank  Artist                   Rank in collaboration network  Comments
1     Sting                    1,406                          Singer, bassist
2     Joni Mitchell            837                            Singer, songwriter
3     Eric Clapton             23                             Guitarist, singer
4     Quincy Jones             46                             Producer, trumpeter
5     Gil Evans                396                            Jazz pianist
6     Jimi Hendrix             3,047                          Guitarist, singer
7     M. Davis† / C. Parker‡   41                             Trumpeter†, saxophonist‡
8     Aretha Franklin          67                             Singer
9     Lenny Kravitz            2,446                          Singer, songwriter
10    Jeff Beck                463                            Guitarist

TABLE II: The ten top-ranked artists in betweenness in each of the intersection data sets, with their ranks in the other
data set indicated. The two ranks are only weakly correlated, with Spearman coefficient 0.255.

To see how different the individuals' roles are in these two networks, in Table II we have indicated the top ten
betweenness scorers from each network, along with their ranks in the other data set. The difference is evident. For
example, Paulinho Da Costa, the prolific Brazilian percussionist, ranked first in the collaboration network, is ranked
merely 2,933rd in the similarity network. On the other hand, Sting, ranked at the top in the similarity network, is
ranked 1,406th in the collaboration network. In fact, none of the top ten artists in either network is ranked as high
in the other network. Quantitatively, the Spearman correlation of the two ranks is 0.255, indicating that the two are
only slightly correlated.
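
For reference, a Spearman coefficient is the Pearson correlation of the two rank lists. The sketch below assigns tied scores their average rank, which is one common convention; the source does not say how ties (such as the repeated rank 5,468 in Table II) were handled, so this choice is an assumption, and the scores used are hypothetical.

```python
from math import sqrt


def average_ranks(scores):
    """Rank scores from highest (rank 1) down; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(scores_x, scores_y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(scores_x), average_ranks(scores_y))


# Hypothetical betweenness scores of four vertices in two networks.
rho = spearman([5.0, 3.0, 3.0, 1.0], [4.0, 4.0, 2.0, 1.0])
rho_reversed = spearman([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

The coefficient is undefined when all scores in one list are equal, a case this sketch does not guard against.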

If we look at Table II in more detail, we can see each vertex's characteristics and/or specialty in action. Artists with
the largest betweenness in the collaboration data set are primarily instrumentalists (except for Rudy Van Gelder, a
prolific recording engineer for the famed Blue Note and Verve labels, among many others), and indeed all nine musicians
are most famous for their virtuosity on the indicated instruments. They must have been invited to work in a multitude of
recording sessions for various projects (Paulinho Da Costa has had 143 collaborators in the intersection
data set, and 508 overall in the entire data set of collaborations), possibly bridging musicians of different styles and
thereby acquiring a high betweenness. However, that did not necessarily translate into their perceived musical styles
becoming as varied. A possible explanation is that some musicians adapt to the style of music that the recording artist
requires.
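
The bridging role described here is what the betweenness centrality of Sec. III E measures. As an illustration, the sketch below computes it for an unweighted, undirected network with Brandes's accumulation scheme, counting every unordered pair of end points once. The source does not state which algorithm or normalization was used, so both are assumptions, and the function name and toy networks are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Betweenness of every vertex; `adj` maps each vertex to its set of neighbors."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting geodesics (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest vertices back to s.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both of its end points.
    return {v: b / 2 for v, b in score.items()}


# Toy networks: a path a-b-c-d and a four-vertex ring.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
ring = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
B_path = betweenness(path)
B_ring = betweenness(ring)
```

In the path, the two inner vertices each lie on two geodesics and score 2, while in the ring every vertex carries one of the two geodesics between its neighbors and scores 0.5.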

Considering the similarity network, it is remarkable to find an exponential decay in its degree distribution, since
many social networks exhibit a power law. Nevertheless, we must be very cautious, since the similarity network has
been designed by human perception (the opinion of experts). In this way, the evaluation of how similarity (i.e., musical
tendencies) spreads will always be filtered by a subjective opinion, a fact that may obscure the real structure
of the similarity network. In this sense, efforts have been made in recent years to obtain numerical
algorithms that evaluate, in a rigorous and objective way, the similarity between songs (and artists). Nevertheless, how
to capture music similarity as perceived by humans is still an open question.

V. CONCLUSIONS 

In this paper, we have looked at various network properties of two types of music networks. One was the
collaboration relations among musicians, which must have evolved naturally, and the other was the musical similarity
amongst them, which was entirely constructed via human perception of music. We have analyzed the structural
properties of the networks, observing that both networks share small-world properties together with a clustering
coefficient following a power law. The latter indicates the existence of a certain modularity that depends on the vertex
degree (leading to a hierarchy). In this way, better connected artists form larger clusters than those of artists with
fewer connections. Although both networks are constructed with artists as vertices and a certain connection between them
(similarity/collaboration), we obtain different results, such as the degree distribution, which follows a power law in
the collaboration network and has an exponential decay in the similarity network. In addition, the Freeman centrality
shows that the vertices with the highest betweenness are completely different in the two networks, a fact that indicates
that collaboration is not the mechanism for similarity spreading. Reciprocally, playing similar music is not an
ingredient for predicting collaboration links. It is indeed usual that artists collaborate with artists of a completely
different style. The difference between the similarity and collaboration networks rules out the possibility of using
collaboration data to infer music similarity. This would have proved convenient because collaboration data is easier to
gather and definitely more objective.